Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race to unregister sync session #5886

Merged
merged 12 commits into from
Sep 30, 2022

Conversation

danieltabacaru
Copy link
Collaborator

@danieltabacaru danieltabacaru commented Sep 23, 2022

What, How & Why?

There is a race when a sync session is closed at the same time with being detached from the SyncManager (as result of the app being destroyed).
This can lead to a sync session being unregistered from the SyncManager after the SyncManager was deallocated (use-after-free).

To prevent this happening, we extend the lifetime of the SyncManager (if the SyncManager is still alive) until the session is unregistered.

Fixes #5752, realm/realm-dart#858.

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • C-API, if public C++ API changed.

Copy link
Contributor

@ironage ironage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!
It can be difficult to write a test to hit races like this, but could you add a test anyways even if it doesn't reproduce 100% of the time?

@ironage ironage requested a review from tgoyne September 23, 2022 17:14
@danieltabacaru
Copy link
Collaborator Author

Great catch! It can be difficult to write a test to hit races like this, but could you add a test anyways even if it doesn't reproduce 100% of the time?

It is indeed pretty hard to hit all conditions to create a good test for this, but I'll give it a try.

@tgoyne
Copy link
Member

tgoyne commented Sep 26, 2022

I don't think this fixes the race condition. As soon as close() releases the lock, a true value stored in needs_to_unregister is stale, and the following sequencing will result in a UAF:

  1. Thread 1: auto& sync_manager = *m_sync_manager; forms a reference to a non-null value
  2. Thread 1: auto needs_to_unregister = !m_detaching_from_sync_manager; reads m_detaching_from_sync_manager and sets needs_to_unregister to true
  3. Thread 1: m_state_mutex.unlock(lock); releases lock
  4. Thread 2: detach_from_sync_manager() acquires lock
  5. Thread 2: m_detaching_from_sync_manager = true
  6. Thread 2: detach_from_sync_manager() releases lock
  7. Thread 2: shutdown_and_wait()
  8. Thread 2: detach_from_sync_manager() acquires lock
  9. Thread 2: m_sync_manager = nullptr;
  10. Thread 2: Returns from detach_from_sync_manager() and the sync manager is deallocated
  11. Thread 1: Checks needs_to_unregister and sees that it's true
  12. Thread 1: sync_manager.unregister_session(m_db->get_path()) calls the now-deallocated sync manager

This requires much more unfavorable scheduling than before, but the race condition is still present.

@danieltabacaru
Copy link
Collaborator Author

Good catch. I think one way to solve the problem once and for all is to only access m_sync_manager under m_state_mutex. The problem is that it will deadlock in this case https://github.com/realm/realm-core/blob/master/src/realm/object-store/sync/sync_manager.cpp#L709-L716. To avoid that I can make m_state_mutex into a std::recursive_mutex (potentially with a CheckedRecursiveMutex and a lock that works with recursive mutexes). What do you think @tgoyne?

@tgoyne
Copy link
Member

tgoyne commented Sep 29, 2022

Recursive mutexes are an extremely bad idea that should almost never be used.

@danieltabacaru
Copy link
Collaborator Author

danieltabacaru commented Sep 29, 2022

Recursive mutexes are an extremely bad idea that should almost never be used.

I agree, but I don't see any other obvious solution. My last commit shows how it would work.

On a side note, we have a project to refactor/rework this part of the codebase, but it would be nice find a fix until then.

@tgoyne
Copy link
Member

tgoyne commented Sep 29, 2022

I think the simplest fix is probably for everywhere that's capturing a reference to the sync manager and then unlocking to instead store a shared_ptr and then unlock.

@danieltabacaru
Copy link
Collaborator Author

I think the simplest fix is probably for everywhere that's capturing a reference to the sync manager and then unlocking to instead store a shared_ptr and then unlock.

Thanks for the suggestion. I'm personally not a big fan of this approach (and we use it in a few other places), but it's good enough for the time being. We'll hopefully get rid of some of these once we refactor the code. I added a unit test to cover the original issue.

@danieltabacaru danieltabacaru merged commit 7b08700 into master Sep 30, 2022
@danieltabacaru danieltabacaru deleted the dt/fix_race_to_unregister_sync_session branch September 30, 2022 10:41
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash in SyncManager::unregister_session
3 participants